Constructing Minimal Spanning Tree Based on Rough Set Theory for Gene Selection
Authors
Abstract
Microarray gene datasets are often high-dimensional, which makes clustering and classification difficult. Datasets containing a huge number of genes increase complexity and therefore degrade dataset-handling performance. Often, not all the measured features of these high-dimensional datasets are relevant for understanding the underlying phenomena of interest. Dimensionality reduction by reduct generation is hence performed as an important step before clustering and classification. The reduced attribute set has the same characteristics as the entire set of attributes in the information system. In this paper, a new attribute reduction technique for unsupervised learning, based on directed minimal spanning trees and rough set theory, is proposed. The method first computes a similarity factor between each pair of attributes using the indiscernibility relation, a concept of rough set theory. Based on the similarity factors, an attribute similarity set is formed, from which a directed weighted graph is constructed with attributes as vertices and the inverse of the similarity factor as edge weights. Then, all possible minimal spanning trees of the graph are generated. From each tree, the most important vertex is iteratively included in the reduct set and all its outgoing edges are removed. The process stops when the edge set is empty, thus producing multiple reducts. The proposed method and some well-known attribute reduction techniques have been applied to several microarray gene datasets for gene selection. The results obtained show the effectiveness of the method.
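The first and last steps described in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not confirm: the similarity factor is taken to be the rough-set dependency degree of one attribute on another, the "most important" vertex is taken to be the one with the most outgoing edges, and the function names `partition`, `dependency`, and `select_attributes` are hypothetical. The construction of the directed minimal spanning trees is omitted, so the selection step simply takes a directed edge list such as one tree would supply.

```python
from collections import defaultdict


def partition(column):
    """Equivalence classes of the indiscernibility relation for one attribute:
    two objects are indiscernible when they share the same attribute value."""
    blocks = defaultdict(set)
    for obj, value in enumerate(column):
        blocks[value].add(obj)
    return list(blocks.values())


def dependency(col_a, col_b):
    """ASSUMED similarity factor: the dependency degree of b on a, i.e. the
    fraction of objects whose a-class lies entirely inside one b-class."""
    blocks_b = partition(col_b)
    positive = sum(
        len(block)
        for block in partition(col_a)
        if any(block <= other for other in blocks_b)
    )
    return positive / len(col_a)


def select_attributes(edges):
    """Greedy selection over directed (source, target) edges.
    ASSUMPTION: importance is the out-degree; ties are broken by name."""
    remaining = set(edges)
    chosen = []
    while remaining:
        out_degree = defaultdict(int)
        for src, _dst in remaining:
            out_degree[src] += 1
        best = max(sorted(out_degree), key=out_degree.get)
        chosen.append(best)
        # Remove every outgoing edge of the selected vertex.
        remaining = {edge for edge in remaining if edge[0] != best}
    return chosen


# Toy information system: four objects, three discretised attributes.
a = [0, 0, 1, 1]
b = [0, 0, 1, 1]
c = [0, 1, 0, 1]
print(dependency(a, b), dependency(a, c))
print(select_attributes([("a", "b"), ("a", "c"), ("b", "d")]))
```

Because the dependency degree is asymmetric, it yields a directed graph naturally, which is one reason it was chosen for this illustration; the paper's own similarity factor may be defined differently.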
Similar resources
Rough Set Based Approach to Selection of Node
Decision trees are widely used in machine learning. In the process of constructing a tree, appropriate attributes have to be selected as nodes of the tree based on some criteria. There are several approaches to the selection of attributes. In this paper, we present a new approach to the selection of attributes for the construction of decision trees based on rough set theory. The basic idea is, if the size of ...
Diagnosis of the disease using an ant colony gene selection method based on information gain ratio using fuzzy rough sets
With advances in metagenome data mining, research has become focused on microarrays. Microarrays are datasets with a large number of genes that are usually irrelevant to the output class; hence, the process of gene selection or feature selection is essential. Removing redundant genes can then increase the speed and accuracy of classification. After applying the gene se...
Multispectral image segmentation using the rough-set-initialized EM algorithm
The problem of segmentation of multispectral satellite images is addressed. An integration of rough-set-theoretic knowledge extraction, the Expectation Maximization (EM) algorithm, and minimal spanning tree (MST) clustering is described. EM provides the statistical model of the data and handles the associated measurement and representation uncertainties. Rough-set theory helps in faster converg...
Non-convex clustering using expectation maximization algorithm with rough set initialization
An integration of a minimal spanning tree (MST) based graph-theoretic technique and expectation maximization (EM) algorithm with rough set initialization is described for non-convex clustering. EM provides the statistical model of the data and handles the associated uncertainties. Rough set theory helps in faster convergence and avoidance of the local minima problem, thereby enhancing the perfo...
Evaluation of Rough Set Theory for Decision Making of rehabilitation Method for Concrete Pavement
In recent years, a great number of advanced theoretical-empirical methods have been developed for the design and modeling of concrete pavement distress. But there is no reliable theoretical method to be used in evaluating concrete pavement distresses and making decisions about repairing them. Only empirical methods are used for this purpose. One of the most usual methods in evaluating concrete paveme...